DETAILED ACTION

Response to Arguments 

1. Applicant's arguments, see Remarks pages 9 and 10, filed 10/21/2008, with
respect to the rejection(s) of claim(s) 1, 13, 15, 27, 29, and 30 under 35 USC 103(a)
have been fully considered and are persuasive. Therefore, the rejection has been
withdrawn. However, upon further consideration, a new ground(s) of rejection is made
in view of Marcu, Daniel et al. US 20020046018 A1 (hereinafter Marcu) in view of Lee et
al. US 6088673 A (hereinafter Lee). Further, in response to the telephonic interview
and Remarks, Binnig et al. US 6792418 B1 (hereinafter Binnig) has been withdrawn.
Though Shriberg teaches discourse analysis, prosodic feature identification, and briefly
mentions speech synthesis, Examiner concurs that Shriberg in view of Binnig does
not specifically teach or suggest determining adjusted synthesized speech output and
input text.

Claim Rejections - 35 USC § 101 

2. 35 U.S.C. 101 reads as follows: 

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of 
matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the 
conditions and requirements of this title. 

Claims 1-14 are rejected under 35 U.S.C. 101 because: 
Claims 1 and 13 fail to clearly recite another statutory class to which the claimed
process is tied. Claims 1 and 13 recite purely mental steps and would not qualify as a
statutory process. In order to qualify as a statutory process, the method claim should
positively recite the other statutory class to which it is tied (i.e. apparatus, device,
product, etc.). For example, the method steps of claim 1 appear to recite mental steps
and do not identify an apparatus that performs the recited method steps, such as a
telephone system or audio device as described in the specification (present invention
[0028]).

As per claim 29, the language "a carrier wave encoded to transmit a control 
program" does not transform the claimed subject matter into statutory subject matter. 

NOTE: 

Claims that recite nothing but the physical characteristics of a form of energy, 
such as a frequency, voltage, or the strength of a magnetic field, define energy or 
magnetism, per se, and as such are nonstatutory natural phenomena. O'Reilly, 56 U.S. 
(15 How.) at 112-14. Moreover, it does not appear that a claim reciting a signal encoded 
with functional descriptive material falls within any of the categories of patentable 
subject matter set forth in § 101.

First, a claimed signal is clearly not a "process" under § 101 because it is not a 
series of steps. The other three § 101 classes of machine, compositions of matter and 
manufactures "relate to structural entities and can be grouped as 'product' claims in 
order to contrast them with process claims." 1 D. Chisum, Patents § 1.02 (1994). The
three product classes have traditionally required physical structure or material. 

"The term machine includes every mechanical device or combination of 
mechanical powers and devices to perform some
function and produce a certain effect or result." Corning v. Burden, 56 U.S. (15 How.)
252, 267 (1854). A modern definition of machine would no doubt include electronic
devices which perform functions. Indeed, devices such as flip-flops and computers are
referred to in computer science as sequential machines. A claimed signal has no
physical structure, does not itself perform any useful, concrete and tangible result and,
thus, does not fit within the definition of a machine.

A "composition of matter" "covers all compositions of two or more substances 
and includes all composite articles, whether they be results of chemical union, or of 
mechanical mixture, or whether they be gases, fluids, powders or solids." Shell 
Development Co. v. Watson, 149 F. Supp. 279, 280, 113 USPQ 265, 266 (D.D.C.
1957), aff'd, 252 F.2d 861, 116 USPQ 428 (D.C. Cir. 1958). A claimed signal is not
matter, but a form of energy, and therefore is not a composition of matter.

The Supreme Court has read the term "manufacture" in accordance with its dictionary
definition to mean "the production of articles for use from raw or prepared materials by
giving to these materials new forms, qualities, properties, or combinations, whether by
hand-labor or by machinery." Diamond v. Chakrabarty, 447 U.S. 303, 308, 206 USPQ
193, 196-97 (1980) (quoting American Fruit Growers, Inc. v. Brogdex Co., 283 U.S. 1,
11, 8 USPQ 131, 133 (1931), which, in turn, quotes the Century Dictionary). Other
courts have applied similar definitions. See American Disappearing Bed Co. v.
Arnaelsteen, 182 F. 324, 325 (9th Cir. 1910), cert. denied, 220 U.S. 622 (1911). These
definitions require physical substance, which a claimed signal does not have. Congress
can be presumed to be aware of an administrative or judicial interpretation of a statute
and to adopt that interpretation when it re-enacts a statute without change. Lorillard v.
Pons, 434 U.S. 575, 580 (1978). Thus, Congress must be presumed to have been 
aware of the interpretation of manufacture in American Fruit Growers when it passed 
the 1952 Patent Act. 

A manufacture is also defined as the residual class of product. 1 Chisum,
§ 1.02[3] (citing W. Robinson, The Law of Patents for Useful Inventions 270 (1890)).

A product is a tangible physical article or object, some form of matter, which a 
signal is not. That the other two product classes, machine and composition of matter, 
require physical matter is evidence that a manufacture was also intended to require 
physical matter. A signal, a form of energy, does not fall within either of the two 
definitions of manufacture. Thus, a signal does not fall within one of the four statutory 
classes of § 101. 

On the other hand, from a technological standpoint, a signal encoded with 
functional descriptive material is similar to a computer-readable memory encoded with 
functional descriptive material, in that they both create a functional interrelationship with 
a computer. In other words, a computer is able to execute the encoded functions, 
regardless of whether the format is a disk or a signal. 

Claim Rejections - 35 USC § 103 

3. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action:

(a) A patent may not be obtained though the invention is not identically disclosed or described as set
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

4. Claims 1-30 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
"Can Prosody Aid the Automatic Classification of Dialog Acts in Conversational 
Speech?" (hereinafter Shriberg) in view of Marcu et al. US 20020046018 A1 (hereinafter 
Marcu) and further in view of Lee et al. US 6088673 A (hereinafter Lee). 

Re claims 1, 15, 29, and 30, Shriberg teaches a method of synthesizing speech
(Page 5) using discourse function level prosodic features (Pages 14-18) comprising the 
steps of: 

determining discourse functions in the input text, the discourse functions being
determined based on a mapping between basic discourse constituents of the
determined theory of discourse analysis and a plurality of discourse functions
(Pages 8-13);

determining a model of discourse function level prosodic features (Pages 14-18); 

However, Shriberg fails to teach determining a theory of discourse analysis from
a plurality of theories of discourse analysis.

Marcu teaches a channel-based summarization process 1700 that begins by receiving
the input text. Although the embodiment described above uses sentences as the input text,
any other text segment could be used instead, for example, clauses, paragraphs, or
entire treatises. Next, in step 1704, the input text is parsed to produce a syntactic tree
in the style of FIG. 11, which is used in step 1706 as the basis of generating multiple
possible solutions (e.g., the shared-forest structure described above). If a whole text is
given as input, the text can be parsed to produce a discourse tree, and the algorithm
described here will operate on the discourse tree (Marcu [0220-0221]).

Further, Marcu teaches that a discourse structure for an input text segment (e.g., a
clause, a sentence, a paragraph or a treatise) is determined by generating a set of one 
or more discourse parsing decision rules based on a training set, and determining a 
discourse structure for the input text segment by applying the generated set of 
discourse parsing decision rules to the input text segment (Marcu [0010]). 

Furthermore, Marcu teaches that generating the set of discourse parsing decision
rules may include iteratively performing one or more operations (e.g., a shift operation 
and one or more different types of reduce operations) on a set of edus to incrementally 
build the annotated text segment associated with the set of edus. The different types of 
reduce operations may include one or more of the following six operations: reduce-ns, 
reduce-sn, reduce-nn, reduce-below-ns, reduce-below-sn, reduce-below-nn. The six 
reduce operations and the shift operation may be sufficient to derive the discourse tree 
of any input text segment (Marcu [0012]). 
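
For illustration only, the shift and reduce operations cited above can be sketched as a small stack-based tree builder. This is a minimal sketch under stated assumptions, not Marcu's implementation: the names `Node` and `build_tree` are hypothetical, the action sequence is supplied by hand rather than chosen by learned decision rules, and the three reduce-below operations are omitted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """A discourse-tree node: either a leaf edu or a span joining two subtrees."""
    text: Optional[str] = None        # set only for leaf edus
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    nuclearity: Optional[str] = None  # "ns", "sn", or "nn" for internal nodes

    def leaves(self) -> List[str]:
        """Return the edus covered by this node, in text order."""
        if self.text is not None:
            return [self.text]
        return self.left.leaves() + self.right.leaves()


# Hypothetical mapping from action name to the nuclearity label it assigns.
REDUCE_LABELS = {"reduce-ns": "ns", "reduce-sn": "sn", "reduce-nn": "nn"}


def build_tree(edus: List[str], actions: List[str]) -> Node:
    """Apply a hand-supplied sequence of shift/reduce actions to a list of edus."""
    queue = list(edus)       # edus not yet consumed
    stack: List[Node] = []   # partial trees built so far
    for action in actions:
        if action == "shift":
            if not queue:
                raise ValueError("shift with no remaining edus")
            stack.append(Node(text=queue.pop(0)))
        elif action in REDUCE_LABELS:
            if len(stack) < 2:
                raise ValueError("reduce needs two trees on the stack")
            right = stack.pop()
            left = stack.pop()
            stack.append(Node(left=left, right=right,
                              nuclearity=REDUCE_LABELS[action]))
        else:
            raise ValueError(f"unsupported action: {action}")
    if queue or len(stack) != 1:
        raise ValueError("action sequence did not yield a single tree")
    return stack[0]
```

For example, `build_tree(["e1", "e2", "e3"], ["shift", "shift", "reduce-ns", "shift", "reduce-nn"])` joins the first two edus as a nucleus-satellite span and then joins that span with the third edu.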

Therefore, it would have been obvious to one of ordinary skill in the art at the
time of the invention to modify the system of Shriberg to incorporate determining a
theory of discourse analysis from a plurality of theories of discourse analysis as taught by
Marcu to allow for the proper rules to analyze input text, wherein the type of input
(phrases, sentences, words, etc.) determines how the structure of the text is determined,
such as by rhetorical analysis (Marcu [0012]).

However, Shriberg in view of Marcu fails to teach:

determining input text; and

determining adjusted synthesized speech output based on the discourse
functions, the model of discourse function level prosodic features (pages 14-18), and
the input text.

Lee teaches a TTS for interlocking with multimedia that comprises a multimedia
information input unit for organizing text, prosody, the
information on synchronization with moving picture, lip-shape, and the information such 
as individual property; a data distributor by each media for distributing the information of 
the multimedia information input unit into the information by each media; a language 
processor for converting the text distributed by the data distributor by each media into 
phoneme stream, presuming prosody information and symbolizing the information; a 
prosody processor for calculating a value of prosody control parameter from the 
symbolized prosody information using a rule and a table; a synchronization adjuster for 
adjusting the duration of the phoneme using the synchronization information distributed 
by the data distributor by each media; a signal processor for producing a synthesized 
speech using the prosody control parameter and data in a synthesis unit database; and 
a picture output apparatus for outputting the picture information distributed by the data 
distributor by each media onto a screen (Lee Col. 2 lines 29-49 & Fig. 1). 
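
For illustration only, three of the stages recited above (language processor, prosody processor, synchronization adjuster) can be sketched as a toy pipeline. This is a sketch under stated assumptions, not Lee's implementation: the function names, the duration table, the use of letters as stand-in phonemes, and the uniform scaling rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Phoneme:
    symbol: str
    duration_ms: float


# Made-up base durations standing in for the "rule and table" of the prosody processor.
BASE_DURATION_MS = {"a": 120.0, "e": 110.0, "i": 100.0, "o": 115.0, "u": 105.0}
DEFAULT_DURATION_MS = 80.0


def language_processor(text: str) -> List[str]:
    """Convert text to a phoneme stream (assumption: one letter per phoneme)."""
    return [ch for ch in text.lower() if ch.isalpha()]


def prosody_processor(symbols: List[str]) -> List[Phoneme]:
    """Look up a prosody control parameter (here, only duration) for each phoneme."""
    return [Phoneme(s, BASE_DURATION_MS.get(s, DEFAULT_DURATION_MS)) for s in symbols]


def synchronization_adjuster(phonemes: List[Phoneme], target_ms: float) -> List[Phoneme]:
    """Scale durations uniformly so the total matches the synchronization target."""
    total = sum(p.duration_ms for p in phonemes)
    if total == 0:
        return list(phonemes)
    scale = target_ms / total
    return [Phoneme(p.symbol, p.duration_ms * scale) for p in phonemes]
```

Chaining the stages, as in `synchronization_adjuster(prosody_processor(language_processor("Hi a")), 600.0)`, yields phonemes whose durations sum to the 600 ms target.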

Therefore, it would have been obvious to one of ordinary skill in the art at the
time of the invention to modify the system of Shriberg in view of Marcu to incorporate 
determining input text and determining adjusted synthesized speech output based on 
the discourse functions, the model of discourse function level prosodic features, and the 
input text as taught by Lee to allow for the proper rules to analyze input text, wherein 
prosody control is established such as phonemic features of text in order to modify 
output speech synthesis to adapt in a multilingual environment (Lee Col. 2 lines 29-49 & 
Fig. 1). 

Re claims 2 and 16, Shriberg teaches the method of claim 1, wherein the 
discourse functions are determined based on the determined theory of discourse 
analysis (Pages 8-13). 

Re claims 3 and 17, Shriberg fails to teach the method of claim 2, in which the 
theory of discourse analysis is at least one of: the Linguistic Discourse Model, the 
Unified Linguistic Discourse Model, Rhetorical Structures Theory, Discourse Structure 
Theory and Structured Discourse Representation Theory; 

Re claims 4 and 18, Shriberg teaches the method of claim 1, wherein the output
information (Pages 4-5, Why Use Prosody?) is at least one of text information and
application output information (Pages 8-13).

Re claims 5 and 19, Shriberg teaches the method of claim 1, wherein
determining the adjusted synthesized speech output (Pages 4-5, Why Use Prosody?) 
further comprises the steps of: 

determining discourse function level prosodic feature adjustments (Pages 14-18); 

However, Shriberg fails to teach determining input text; and

determining the adjusted synthesized speech output based on the synthesized
speech output and the discourse level prosodic feature adjustments.

Lee teaches a TTS for interlocking with multimedia that comprises a multimedia
information input unit for organizing text, prosody, the
information on synchronization with moving picture, lip-shape, and the information such 
as individual property; a data distributor by each media for distributing the information of 
the multimedia information input unit into the information by each media; a language 
processor for converting the text distributed by the data distributor by each media into 
phoneme stream, presuming prosody information and symbolizing the information; a 
prosody processor for calculating a value of prosody control parameter from the 
symbolized prosody information using a rule and a table; a synchronization adjuster for 
adjusting the duration of the phoneme using the synchronization information distributed 
by the data distributor by each media; a signal processor for producing a synthesized 
speech using the prosody control parameter and data in a synthesis unit database; and 
a picture output apparatus for outputting the picture information distributed by the data 
distributor by each media onto a screen (Lee Col. 2 lines 29-49 & Fig. 1).

Therefore, it would have been obvious to one of ordinary skill in the art at the
time of the invention to modify the system of Shriberg in view of Marcu to incorporate 
determining input text and determining the adjusted synthesized speech output based 
on the synthesized speech output and the discourse level prosodic feature adjustments 
as taught by Lee to allow for the proper rules to analyze input text, wherein prosody 
control is established such as phonemic features of text in order to modify output 
speech synthesis to adapt in a multilingual environment (Lee Col. 2 lines 29-49 &
Fig. 1).

Re claims 6 and 20, Shriberg teaches the method of claim 1, wherein the
model of discourse function level prosodic features (Pages 14-18) is a predictive model 
of discourse functions (Page 19). 

Re claims 7 and 21, Shriberg teaches the method of claim 6, in which the
predictive models are determined based on at least one of: machine learning and rules 
(Page 19). 

Re claims 8 and 22, Shriberg teaches the method of claim 1, in which the
prosodic features occur in at least one of a location: preceding, within and following the
associated discourse function (Page 14).

Re claims 9 and 23, Shriberg teaches the method of claim 1, in which the
prosodic features are encoded within a prosodic feature vector. 

Re claims 10 and 24, Shriberg teaches the method of claim 9, in which the 
prosodic feature vector is a multimodal feature vector (Pages 14-18 & Table 10). 

Re claims 11 and 25, Shriberg teaches the method of claim 1, in which the
discourse functions include an intra-sentential discourse function (Page 8 & Table 1). 

Re claims 12 and 26, Shriberg teaches the method of claim 1, in which the
discourse functions include an inter-sentential discourse function (Page 8 & Table 1). 

Re claim 13, Shriberg teaches a method of synthesizing speech using discourse 
function level prosodic features comprising the steps of: 

determining discourse functions in the input text based on a contextually aware 
theory of discourse analysis using a mapping between basic discourse constituents of 
the contextually aware theory of discourse analysis and a plurality of discourse 
functions (Pages 8-13); 

determining a model of discourse function level prosodic features (Pages 14-18); 

However, Shriberg fails to teach determining input text; and

determining adjusted synthesized speech output based on the discourse
functions and the model of discourse function level prosodic features (pages 14-18).

Lee teaches a TTS for interlocking with multimedia that comprises a multimedia
information input unit for organizing text, prosody, the
information on synchronization with moving picture, lip-shape, and the information such 
as individual property; a data distributor by each media for distributing the information of 
the multimedia information input unit into the information by each media; a language 
processor for converting the text distributed by the data distributor by each media into 
phoneme stream, presuming prosody information and symbolizing the information; a 
prosody processor for calculating a value of prosody control parameter from the 
symbolized prosody information using a rule and a table; a synchronization adjuster for 
adjusting the duration of the phoneme using the synchronization information distributed 
by the data distributor by each media; a signal processor for producing a synthesized 
speech using the prosody control parameter and data in a synthesis unit database; and 
a picture output apparatus for outputting the picture information distributed by the data 
distributor by each media onto a screen (Lee Col. 2 lines 29-49 & Fig. 1). 

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time of the invention to modify the system of Shriberg in view of Marcu to incorporate 
determining input text and determining adjusted synthesized speech output based on 
the discourse functions and the model of discourse function level prosodic features as 
taught by Lee to allow for the proper rules to analyze input text, wherein prosody control 
is established such as phonemic features of text in order to modify output speech
synthesis to adapt in a multilingual environment (Lee Col. 2 lines 29-49 & Fig. 1). 

Re claims 14 and 28, Shriberg teaches the method of claim 13, in which the 
context is at least one of: semantic, pragmatic, and syntactic context (pages 4-5). 

Conclusion 

Any inquiry concerning this communication or earlier communications from the
examiner should be directed to Michael C. Colucci whose telephone number is
(571)-270-1847. The examiner can normally be reached on 9:30 am - 6:00 pm,
Monday-Friday.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, Richemond Dorvil, can be reached on (571)-272-7602. The fax phone
number for the organization where this application or proceeding is assigned is
571-273-8300.

Information regarding the status of an application may be obtained from the
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 